Abstract

Numerical models of materials are developing rapidly, and the complexity of new models grows
quickly together with their computational demands. Despite the growing performance of
modern computers and clusters, calibration of such models from noisy experimental data remains
a nontrivial and often computationally exhaustive task. Layered neural networks thus represent
a robust and efficient technique for overcoming the time-consuming simulations of a calibrated
model. The potential of neural networks lies in their simple implementation and high versatility
in approximating nonlinear relationships. Therefore, several approaches have been proposed to
accelerate the calibration of nonlinear models by neural networks. This contribution reviews and
compares three possible strategies based on approximating (i) the model response, (ii) the inverse
relationship between the model response and its parameters and (iii) the error function quantifying
how well the model fits the data. The advantages and drawbacks of the particular strategies are
demonstrated on the calibration of four parameters of the affinity hydration model from simulated
data as well as from experimental measurements. This model is highly nonlinear but computationally
cheap, which allows its calibration without any approximation and thus a better assessment of the
results obtained by the examined calibration strategies. The paper can thus be viewed as a guide
intended to help engineers select an appropriate strategy for their particular calibration
problems.
1. Introduction

Development in numerical modelling provides the possibility to describe complex phenomena
in material or structural behaviour. The resulting models are, however, often highly nonlinear
and defined by many parameters, which have to be estimated so as to properly describe the
investigated system and its behaviour. The aim of model calibration is thus to rediscover unknown
parameters knowing the experimentally obtained response of a system to given excitations.
The principal difficulty of model calibration is that, while the numerical model
of an experiment represents a well-defined mapping from input (model, material, structural, or
other parameters) to output (structural response), there is no guarantee that the inverse relation
even exists.

The most widely used approach to parameter identification relies on
an error minimisation technique, where the distance between parameterised model predictions
and observed data is minimised [1]. Since the inverse relation (mapping of model outputs to
its inputs) is often ill-posed, the error minimisation technique leads to a difficult optimisation
problem, which is highly nonlinear and multi-modal. Therefore, the choice of an appropriate
identification strategy is not trivial.

Another approach intensively developed during the last decade is based on Bayesian updating
of uncertainty in the parameters’ description Sll. The uncertainty in observations is expressed
by a corresponding probability distribution and employed, together with the prior expert knowledge
about the parameter values, for estimation of the so-called posterior probabilistic description
of the identified parameters 01. The unknown parameters are thus modelled as random variables
originally endowed with prior expert-based probability density functions, which are then updated
using the observations to the posterior density functions. While the error minimisation
techniques lead to a single point estimate of the parameters’ values, the result of Bayesian
inference is a probability distribution that summarizes all available information about the
parameters. Another very important advantage of Bayesian inference consists in treating the
inverse problem as a well-posed problem in an expanded stochastic space.

Despite the progress in uncertainty quantification methods 01, the additional information provided
by Bayesian inference generally comes at the price of more time-consuming computations. In many
situations, the single point estimate approach remains the only feasible one and the development of
efficient tools suitable for this strategy is still a topical issue. Within the last several decades,
a lot of attention was paid to the so-called intelligent methods of information processing and
among them especially to soft computing methods such as artificial neural networks (ANNs),
evolutionary algorithms or fuzzy systems 0. A review of soft computing methods for parameter
identification can be found e.g. in 0-. In this paper, we focus on applications of ANNs in the
single point approach to parameter identification. In particular, we elaborate our previous work
presented in lilOl II ill with the goal to present a detailed and comprehensive comparison of three
different strategies of ANNs’ usage in parameter identification problems.

The next section briefly recalls the basics of ANNs. A classification of the different applications
of ANNs in calibration problems is introduced in Section [3] and a description of the illustrative
example - the affinity hydration model for concrete - follows in Section [4]. In the context of this
particular example, the calibration strategies are then described in detail in five sections, starting
with training data preparation and sensitivity analysis in Section [5]. Neural network inputs and
outputs in particular strategies are discussed in Section [6], and training together with the choice
of topology is described in Section [7]. Verification and validation on simulated and experimental
data are summarized in Sections [8] and [9], respectively. Finally, the results are concluded in
Section [10].


2. Artificial neural network 

Artificial neural networks (ANNs) JT3Q are powerful computational systems consisting
of many simple processing elements - so-called neurons - connected together to perform tasks in
an analogy to biological brains. Their main feature is the ability to change their behaviour based
on external information that flows through the ANN during the learning (training) phase.
A particular type of ANN is the so-called feedforward neural network, which consists of
neurons organized into layers where outputs from one layer are used as inputs into the following
layer, see Figure [1]. There are no cycles or loops in the network and no feed-back connections.
The most frequently used example is the multi-layer perceptron (MLP) with a sigmoid transfer
function and a gradient descent method of training called the back-propagation learning algorithm.
In practical usage, MLPs are known for their ability to approximate nonlinear relations and
therefore, when speaking about an ANN, the MLP is considered in the following text.



The input layer represents a vector of input parameters which are directly the outputs of the
input layer. The outputs $o_{i-1,k}$ of the $(i-1)$-th layer are multiplied by a vector of constants
$w_{i,j,k}$, the so-called synaptic weights, summed and used as inputs $u_{i,j}$ into the $j$-th
neuron in the following $i$-th layer. Elements in the hidden and output layers - neurons - are
defined by an activation function $f_a(u_{i,j})$, which is applied on the input and produces the
output value of the $j$-th neuron in the $i$-th layer, i.e.

    $o_{i,j} = f_a(u_{i,j})$, where $u_{i,j} = \sum_{k=0}^{K} o_{i-1,k} \, w_{i,j,k}$.    (1)


The synaptic weights $w_{i,j,k}$ are parameters of an ANN to be determined during the training
process. $K$ is the number of neurons in the $(i-1)$-th layer. The type of the activation function
is usually chosen in accordance with the type of a function to be approximated. In the case of
continuous problems, the sigmoid activation function given as

    $o_{i,j} = f_a(u_{i,j}) = \frac{1}{1 + e^{-u_{i,j}}}$    (2)

is the most common choice.

One bias neuron is usually added into the input and hidden layers. It does not contain an
activation function, but only a constant value. Its role is to shift the value of the sum over
the outputs of its neighbouring neurons before this sum enters as the input into the neurons in
the following layer. The values of the biases are determined by the training process together with
the synaptic weights.
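
To make Eqs. (1) and (2) concrete, the following minimal sketch evaluates one pass through an MLP with a single hidden layer. It is an illustration only, not the implementation used in this study: the function names are hypothetical, and the bias neuron is assumed to output a constant 1 whose weight (index $k = 0$) is trained together with the other synaptic weights.

```python
import math


def sigmoid(u):
    """Sigmoid activation function of Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-u))


def layer_output(prev_outputs, weights):
    """Outputs o_{i,j} of one layer according to Eq. (1).

    prev_outputs -- outputs o_{i-1,k} of the previous layer, k = 1..K
    weights      -- one row per neuron j; row[0] multiplies the bias neuron
                    (assumed constant 1), row[k] multiplies o_{i-1,k}
    """
    extended = [1.0] + list(prev_outputs)  # k = 0 is the bias neuron
    outputs = []
    for row in weights:
        u = sum(w * o for w, o in zip(row, extended))  # weighted sum u_{i,j}
        outputs.append(sigmoid(u))
    return outputs


def mlp_forward(inputs, hidden_weights, output_weights):
    """Forward pass: input layer -> one hidden layer -> output layer."""
    hidden = layer_output(inputs, hidden_weights)
    return layer_output(hidden, output_weights)
```

For example, `mlp_forward([0.2, 0.4], [[0.0, 0.0, 0.0]], [[0.0, 0.0]])` describes a network with two inputs, one hidden neuron and one output neuron; with all weights equal to zero every neuron returns the sigmoid of zero.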

Despite the popularity of ANNs, there are only a few recommendations for the choice of an ANN’s
architecture. The authors, e.g. in 00, show that an ANN with any of a wide variety of
continuous nonlinear hidden-layer activation functions and one hidden layer with an arbitrarily
large number of units suffices for the "universal approximation" property. Therefore, we limit
our numerical experiments to such a case. The number of units in the input and the output layer is
usually given by the studied problem itself, but there is no theory yet specifying the number of
units in the hidden layer. On one hand, too small a number of hidden units leads to large prediction
errors. On the other hand, a large number of hidden units may cause the so-called overfitting,
where the ANN provides precise outputs for the training samples, but fails in the case of unseen
samples. In such a situation, the ANN fits the training data at the price of increasing oscillations
in the intermediate space.

To overcome this problem, some model selection technique GS has to be applied in order to
perform a guided choice of the ANN’s topology. Recent approaches encompass e.g. growing-pruning
methods (see e.g. IH) or more complex techniques designed for optimisation of the
ANN’s topology such as meta-learning [18, 19]. Here we employ a simple and general strategy to
evaluate a particular ANN’s topology: cross-validation, because it does not involve any
probabilistic assumptions or dependencies on an identification problem. The idea of cross-validation
is based on a repeated evaluation of the ANN’s prediction error for a chosen subset of the training
data and selection of the ANN with the smallest averaged prediction error. Compared to the well-known
model validation on some independent set of data, the advantage of cross-validation consists in
better behaviour on smaller data sets, where an independent data set cannot be afforded &

Before applying the ANN to any engineering problem, one also has to resolve several questions
regarding the training data preparation. It involves not only the transformation of input
and output data into the range of the activation functions. In simulation problems, where the
ANN is applied to mimic some unknown relationship between observed quantities, the training
data coincide with the measured data. In inverse problems, we already have some theoretical
model relating those quantities and we train the ANN on simulated data, see a recent review of
ANN applications in structural mechanics [21]. Preparation of a suitable training set becomes
another nontrivial task, where sensitivity analysis plays an important role. For the sake of clarity,
we address these topics in more detail in Section [5] in the context of a particular model for cement
hydration.


3. Strategies for application of ANN in model calibration 

Figure 2: Scheme of model calibration procedure.

In model calibration, the goal is to find a set of model parameters minimising the difference
between the model response and experimental measurements, see Figure [2]. An intuitive way
of solving the calibration problem is to formulate an error function quantifying this difference
and to minimise the error function using some optimisation algorithm. The most common error
functions are given as


    $F_1 = \sum_{i=1}^{N_R} (r_i - d_i)^2$,    (3)

    $F_2 = \sum_{i=1}^{N_R} |r_i - d_i|$,    (4)


where $r_i$ is the $i$-th component of the model response corresponding to the $i$-th measured
quantity $d_i$ and $N_R$ is the number of measured quantities. The difficulty arises from the
nonlinear relation between the model response and model parameters, which often causes complexity
of the error function such as multi-modality or non-differentiability. Therefore, the computationally
efficient methods based on an analytically or numerically obtained gradient can be applied only in
specific cases.
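
The two error functions of Eqs. (3) and (4) can be sketched in a few lines. The function names are illustrative, and the model response and the measurements are assumed to be given as equally long sequences of numbers.

```python
def error_f1(response, data):
    """Sum of squared differences between response r_i and data d_i, Eq. (3)."""
    if len(response) != len(data):
        raise ValueError("response and data must have the same length")
    return sum((r - d) ** 2 for r, d in zip(response, data))


def error_f2(response, data):
    """Sum of absolute differences between response r_i and data d_i, Eq. (4)."""
    if len(response) != len(data):
        raise ValueError("response and data must have the same length")
    return sum(abs(r - d) for r, d in zip(response, data))
```

Because of the square, $F_1$ penalises large individual deviations more strongly than $F_2$, while $F_2$ is not differentiable wherever a response component coincides with the measurement.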

A more general approach is to apply an optimisation algorithm which can handle the multi-modality
once furnished with a sufficient number of function evaluations. However, one evaluation
of an error function always involves a simulation of the model. Even for a relatively fast
model simulation, the optimisation can easily become infeasible because of the huge number
of function evaluations commonly needed by evolutionary algorithms, even though they usually
need fewer simulations than the uncertainty-based methods mentioned in the introductory part of the
paper.

One way of reducing the number of model simulations is to construct a forward model
approximation based e.g. on an ANN. The error function minimisation then becomes a minimisation
of the distance between the ANN’s predictions and experimental data. The efficiency of this
strategy relies on the evaluation of the trained ANN being significantly faster than the full
model simulation. The advantage of this strategy is that the ANN is used to approximate a known
mapping which certainly exists and is well-posed. The computational costs of this strategy are
separated into two parts of a similar size: (i) the ANN training - optimisation of synaptic weights
and (ii) the minimisation of an error in the ANN prediction for experimental data - optimisation
of ANN inputs (i.e. determination of the investigated model parameters). An important shortcoming
of this method is that this ill-posed optimisation problem needs to be solved repeatedly for
any new experimental measurement. This way of applying an ANN to parameter identification
was presented e.g. in [22], where an ANN is used for predicting load-deflection curves
and the conjugate directions algorithm is then applied for optimisation of ductile damage and
fracture parameters. The authors in [23] train an ANN to approximate the results of FE simulations
of jet-grouted columns and optimise the column radius and the cement content of the columns
by a genetic algorithm. Principally the same methods are used for identification of elasto-plastic
parameters in [24].

One more difficulty of the forward model approximation concerns the number of parameters
and response components. It is very common that the experimental observations are represented
by discretised curves or surfaces in time or space dimensions, defined as vectors with
a large number of components. A forward model then represents a mapping from a usually
low-dimensional parameter space to a high-dimensional response space. Although this mapping is
well-posed, the surrogate model must have a large number of outputs or the time and/or space
dimensions have to be included among the model inputs.

Another way of avoiding the mapping to a large number of outputs is to construct the error
function approximation, where the model parameters are mapped onto only one scalar value.
One important inconvenience of such a strategy is of course the complexity of the error function,
which can be, as mentioned above, highly nonlinear, multi-modal and/or non-smooth. Higher
complexity of the approximated relation leads to a higher number of simulations needed for the
construction of the approximation. This raises another problem: the choice of
an appropriate design of experiments, i.e. the sets of parameters for which the simulations are
performed, that enables building the surrogate with a relatively small error. This problem can be
reduced by adaptive addition of design points, i.e. new simulations, close to the minimum of the
error function approximation. The result of the new simulation is then used for an improvement of
the surrogate and a new optimisation process is run again. Such an approach is usually well suited
for surrogates based on kriging or radial basis function networks [25, 26]. In this paper, we limit
our attention to the application of feedforward layered neural networks and thus, we investigate
their ability to approximate the error function with a limited number of simulations in a
non-adaptive fashion.

While the strategy of the forward model approximation involves a new optimisation process
for any new data, the strategy of the error function approximation involves not only the
optimisation process, but also the surrogate model construction. Regarding this aspect, the most
convenient strategy is the inverse relation approximation, which needs only one evaluation to furnish
the parameters corresponding to new observations. Of course, by new observations we mean
observations of a system with different properties but performed under the same external
conditions (e.g. a different material, but the same geometry of the specimen and loading conditions).
The strategy of the inverse relation approximation assumes the existence of an inverse
relationship between the outputs and the inputs of the calibrated model. If such a relationship
exists at least on a specified domain of the parameters’ values, it can be approximated by an ANN.
Here the ANN’s training process is responsible for all computational costs arising from a solution
of the ill-posed problem. This way of applying an ANN to parameter identification was
presented e.g. in [27] or recently in [28] for identification of mechanical material parameters,
in [29] for estimation of the elastic modulus of the interface tissue on dental implant surfaces,
in [30] for identification of the interfacial heat transfer coefficient or in [31] for determination
of geometrical parameters of circular arches.

In order to illustrate the advantages and disadvantages of the outlined strategies of the ANN’s
application to model calibration, we have chosen a computationally simple but nonlinear affinity
hydration model briefly described in the following section. The model was successfully validated
on Portland cements in [32] and thus allows us to also validate the described identification
strategies on experimental data as summarized in Section [9].


4. Affinity hydration model 

Affinity hydration models provide a framework for accommodating all stages of cement hydration.
We consider hydrating cement under an isothermal temperature of 25°C. At this temperature,
the rate of hydration can be expressed by the chemical affinity $A_{25}(\alpha)$ under isothermal 25°C

    $\frac{d\alpha}{dt} = A_{25}(\alpha)$,    (5)

where the chemical affinity has a dimension of time$^{-1}$ and $\alpha$ stands for the degree of hydration.

The affinity for isothermal temperature can be obtained experimentally; isothermal calorimetry
measures a heat flow $q(t)$ which gives the hydration heat $Q(t)$ after integration. The
approximation is given

    $\frac{Q(t)}{Q_{pot}} \approx \alpha$,    (6)

    $\frac{1}{Q_{pot}} \frac{dQ(t)}{dt} = \frac{q(t)}{Q_{pot}} \approx \frac{d\alpha}{dt} = A_{25}(\alpha)$,    (7)

where $Q_{pot}$ is expressed in J/g of cement paste. Hence the normalized heat flow under
isothermal 25°C equals the chemical affinity $A_{25}(\alpha)$.

Cervera et al. [33] proposed an analytical form of the normalized affinity which was refined
in [34]. Here we use a slightly modified formulation [35]:

    $A_{25}(\alpha) = B_1 \left( \frac{B_2}{\alpha_\infty} + \alpha \right) (\alpha_\infty - \alpha) \exp\left( -\eta \frac{\alpha}{\alpha_\infty} \right)$,    (8)

where $B_1$, $B_2$ are coefficients related to chemical composition, $\alpha_\infty$ is the ultimate
hydration degree and $\eta$ represents microdiffusion of free water through formed hydrates.

When hydration proceeds under varying temperature, the maturity principle expressed via the
Arrhenius equation scales the affinity to an arbitrary temperature $T$ [°C]

    $A_T = A_{25} \exp\left[ \frac{E_a}{R} \left( \frac{1}{273.15 + 25} - \frac{1}{273.15 + T} \right) \right]$,    (9)

where $R$ is the universal gas constant (8.314 J mol$^{-1}$ K$^{-1}$) and $E_a$ [J mol$^{-1}$] is the
activation energy. For example, simulating isothermal hydration at 35°C means scaling $A_{25}$ with
a factor of 1.651 at a given time. This means that concrete hydrating for 10 hours at 35°C releases
the same amount of heat as concrete hydrating for 16.51 hours under 25°C. Note that setting
$E_a = 0$ ignores the effect of temperature, so that the hydration proceeds as under 25°C. The
evolution of $\alpha$ is obtained through numerical integration since there is no exact analytical
solution.
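
The numerical integration of the model can be sketched as follows. This is a minimal illustration under stated assumptions, not the integrator used in this study: the affinity is assumed to have the form $B_1 (B_2/\alpha_\infty + \alpha)(\alpha_\infty - \alpha) \exp(-\eta \alpha / \alpha_\infty)$, a simple forward Euler scheme with a fixed step is used, and the activation energy of 38300 J mol$^{-1}$ in the usage note is an illustrative value chosen only because it reproduces the scaling factor of 1.651 quoted above. All function names are hypothetical.

```python
import math

R_GAS = 8.314  # universal gas constant [J mol^-1 K^-1], as quoted in the text


def affinity_25(alpha, b1, b2, eta, alpha_inf):
    """Normalised affinity at 25 degC [1/h]; assumed form of Eq. (8)."""
    return (b1 * (b2 / alpha_inf + alpha) * (alpha_inf - alpha)
            * math.exp(-eta * alpha / alpha_inf))


def arrhenius_factor(temp_c, e_a):
    """Factor scaling the 25 degC affinity to temperature temp_c [degC], Eq. (9)."""
    return math.exp(e_a / R_GAS
                    * (1.0 / (273.15 + 25.0) - 1.0 / (273.15 + temp_c)))


def hydration_degree(times_h, b1, b2, eta, alpha_inf,
                     temp_c=25.0, e_a=0.0, dt=0.01):
    """Forward Euler integration of d(alpha)/dt starting from alpha = 0.

    Returns the degree of hydration at each requested time (in hours,
    processed in ascending order).
    """
    scale = arrhenius_factor(temp_c, e_a)
    alpha, t, result = 0.0, 0.0, []
    for t_end in sorted(times_h):
        while t < t_end:
            step = min(dt, t_end - t)
            alpha += step * scale * affinity_25(alpha, b1, b2, eta, alpha_inf)
            t += step
        result.append(alpha)
    return result
```

With these assumptions, `arrhenius_factor(35.0, 38300.0)` evaluates to approximately 1.651, and `hydration_degree([1.0, 10.0, 100.0], 0.5, 1e-4, 7.0, 0.85)` returns an increasing sequence bounded by the ultimate hydration degree 0.85.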


5. Sensitivity analysis and training data preparation 


Since the ANN’s training process requires the preparation of a training data set, it is also
worthwhile to use these data for a sampling-based sensitivity analysis [36, 37] and obtain some
information about the importance of particular observations or the significance of each parameter
for the system behaviour. To obtain reliable information from the sensitivity analysis as well as
a good approximation by an ANN, one has to choose the training data carefully according to a suitable
design of experiments, see e.g. [38] for a competitive comparison of several experimental designs.

As the model parameters are defined on various intervals, they need to be transformed into
standardised parameters, e.g. $p_i \in [0; 1]$, defined on the intervals suitable for the chosen
activation functions. When the bounds for a parameter differ by orders of magnitude, this typically
suggests a highly nonlinear relationship with the model response. At this moment, any expert
knowledge about the parameter meaning can be employed to decrease that nonlinearity by introducing
a nonlinear transformation to the standardised parameter. This is demonstrated on parameter $B_2$
in Table [1], where the bounds for the affinity model parameters together with their relations to
the standardised parameters $p_i$ are listed.
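
The relations between the model parameters and the standardised parameters listed in Table 1 can be sketched as a pair of mutually inverse functions. The function names are illustrative, and the logarithm in the relation for $B_2$ is assumed to be the decadic one, which is the choice that maps the bounds $10^{-6}$ and $10^{-3}$ onto 0 and 1.

```python
import math


def to_standardised(b1, b2, eta, alpha_inf):
    """Map model parameters to standardised parameters p_1..p_4 in [0, 1]."""
    return ((b1 - 0.1) / 0.9,
            (math.log10(b2) + 6.0) / 3.0,   # decadic logarithm (assumption)
            (eta - 2.0) / 10.0,
            (alpha_inf - 0.7) / 0.3)


def from_standardised(p1, p2, p3, p4):
    """Inverse mapping from standardised parameters back to model parameters."""
    return (0.1 + 0.9 * p1,
            10.0 ** (3.0 * p2 - 6.0),
            2.0 + 10.0 * p3,
            0.7 + 0.3 * p4)
```

The logarithmic relation spreads the values of $B_2$ evenly over its three orders of magnitude, so that a uniform sampling of $p_2$ does not concentrate almost all samples near the upper bound.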

Parameter                  Minimum      Maximum      Relation
$B_1$ [h$^{-1}$]           0.1          1            $p_1 = (B_1 - 0.1)/0.9$
$B_2$ [-]                  $10^{-6}$    $10^{-3}$    $p_2 = (\log B_2 + 6)/3$
$\eta$ [-]                 2            12           $p_3 = (\eta - 2)/10$
$\alpha_\infty$ [-]        0.7          1.0          $p_4 = (\alpha_\infty - 0.7)/0.3$

Table 1: Bounds for affinity model parameters.

Figure 3: Influence of model parameters to model response $\alpha$.

The affinity hydration model was chosen not only for its nonlinearity, but especially for its
relatively simple interpretation and computationally fast simulation. Hence, we assume that the
model is suitable for illustrating typical features of particular identification strategies. In order
to understand the influence of the model parameters on its response more deeply, Figure [3]
demonstrates the changes of the response induced by changes in a chosen parameter while the other
parameters are fixed. On the other hand, to illustrate the spread of the model response corresponding
to the parameters varying within the given domain, we prepare a design of experiments (DoE)
having $N_{DoE} = 100$ samples in the space of standardised parameters. The DoE is generated
as Latin Hypercube Sampling optimised with respect to the modified $L_2$ discrepancy. Such an
experimental design has a good space-filling property and is nearly orthogonal [38]. For each
design point we perform a model simulation to obtain a bundle of $N_{DoE}$ curves for the degree of
hydration $\alpha(t)$, see Figure [4]a.
Figure 4: Bundle of degree of hydration curves obtained for design points (a) and sensitivity
analysis for input-output pairs (b).
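
The basic construction of a Latin Hypercube design in the space of standardised parameters can be sketched as follows. This is a simplified illustration with hypothetical names: it generates a plain random Latin Hypercube and omits the subsequent optimisation with respect to the discrepancy criterion that is applied to the design used in this study.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Random Latin Hypercube design on the unit hypercube [0, 1)^n_dims.

    Every coordinate axis is divided into n_samples strata of equal width and
    each stratum contains exactly one design point.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # one random point inside each stratum, then a random ordering
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns.append(points)
    # combine the independently shuffled coordinates into design points
    return [tuple(column[i] for column in columns) for i in range(n_samples)]
```

Calling `latin_hypercube(100, 4)` yields 100 points in the space of the four standardised parameters; each point can then be mapped to model parameters and passed to the model simulation.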



Since the model response is represented by the degree of hydration being a function of
time, the time domain is discretised into 1161 steps uniformly distributed with the logarithm of
time. Hence, the model input vector $p = (p_1, p_2, p_3, p_4)$ consists of 4 parameters and the
output vector $\alpha = (\alpha_1, \ldots, \alpha_{N_{time}})$ consists of $N_{time} = 1161$ components.
In order to quantify the influence of the model parameters on particular response components, we
evaluate Spearman’s rank correlation coefficient $\rho$ for each $(p_i, \alpha_j)$ pair using all the
$N_{DoE}$ simulations. The results of such a sampling-based sensitivity analysis [36] are plotted in
Figure [4]b.
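
Spearman’s rank correlation coefficient for one input-output pair can be sketched as the Pearson correlation of the ranks of both samples. The function names are illustrative; tied values are assumed to receive the average of their ranks.

```python
def ranks(values):
    """Ranks starting from 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            result[idx] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In the sensitivity analysis, `x` would hold the values of one standardised parameter over all simulations and `y` the corresponding values of one response component. Because only ranks enter the computation, any monotonic relationship, linear or not, yields a coefficient of magnitude one.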

In the inverse mode of identification, the model output vector $\alpha$ consisting of $N_{time} = 1161$
components is too large for usage as an input vector for the ANN. Hence, we performed the
principal component analysis (PCA) in order to reduce this number to $N_{PCA} = 100$ components
$\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_{100})$ with non-zero variance (this number is
related to the number of simulations involved in PCA, i.e. $N_{PCA} = N_{DoE}$). The components are
ordered according to their relative variance, see Figure [5]a for the nine most important ones. The
resulting principal components are technically new quantities obtained by a linear combination of
the original model outputs, $\bar{\alpha} = A(\alpha)$. This transformation of course has an influence
on the sensitivity analysis and thus we computed correlations between the model inputs $p_i$ and
the principal components $\bar{\alpha}_j$, see Figure [5]b.

Figure 5: Variance explained by the first nine principal components (a) and sensitivity analysis
for model inputs $p_i$ - principal components $\bar{\alpha}_j$ (b).

6. Implementation of approximation strategies 

Results of the described simulations are also used as training simulations for ANNs, i.e.
$\mathcal{D}_{train} = \{(p_i, \alpha_i) \mid i \in \{1, 2, \ldots, N_{train}\}, N_{train} = N_{DoE} = 100\}$.
Particular approximation strategies, however, process the training simulations in a different way.

The strategy of the forward model approximation can be formulated in two ways, which
differ in handling the high dimensionality of the model output $\alpha$. In the first formulation, we
can consider the time step $t_k$ as the fifth model parameter (i.e. the fifth model input) and thus
the model output reduces to only one scalar value of the hydration degree $\alpha_k$ corresponding to
the given time $t_k$. As the objective of the ANN is thus to span the parameter as well as the time
space, we call this strategy Forward Complex (ForwComp). In such a configuration, the results
of $N_{train}$ training simulations turn into $N_{train} \times N_{time} = 116,100$ training samples.
Evaluation of so many samples at every iteration of the ANN’s training process is, however, very
time-consuming. Therefore, only every $m$-th time step is included for ANN training and thus the
training set is given as
$\mathcal{D}^{ForwComp}_{train} = \{((p_i, t_k), \alpha_{i,k}) \mid i \in \{1, 2, \ldots, N_{train}\}, k \in \{1, 1 + m, 1 + 2m, \ldots, N_{time}\}\}$.
In our particular implementation, we selected $m = 10$ leading to
$|\mathcal{D}^{ForwComp}_{train}| = 11,700$ samples. Note that in all other strategies, the number of
training samples equals the number of training simulations, see Table [2], where the significant
parameters of particular approximation strategies are briefly summarised.
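
The assembly of the Forward Complex training set from the training simulations can be sketched as follows. The function name and the data layout (one parameter tuple and one response list per simulation, plus a common list of time steps) are assumptions made for illustration.

```python
def forward_complex_samples(params, times, responses, m=10):
    """Build ((p_1, ..., p_4, t_k), alpha_k) samples using every m-th time step.

    params    -- list of N_train tuples of standardised parameters
    times     -- list of N_time time steps t_k shared by all simulations
    responses -- list of N_train lists, each holding N_time values alpha_k
    """
    samples = []
    for p, alpha in zip(params, responses):
        # 0-based indices 0, m, 2m, ... correspond to k = 1, 1 + m, 1 + 2m, ...
        for k in range(0, len(times), m):
            samples.append((tuple(p) + (times[k],), alpha[k]))
    return samples
```

For 100 simulations with 1161 time steps each and `m = 10`, the function selects 117 time steps per simulation and returns 11,700 samples, in agreement with the number stated above.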


Strategy             $N_{ANN}$   Inputs                                                    Outputs                                                                   $|\mathcal{D}_{train}|$
Forward Complex      1           $p_1, p_2, p_3, p_4, t_k$                                 $\alpha_k \mid k \in \{1, 11, \ldots, 1161\}$                            11700
Forward Split        9           $p_1, p_2, p_3, p_4$                                      $\alpha_{300}; \alpha_{400}; \ldots; \alpha_{1100}$                      100
Forward Split II     22          $p_1, p_2, p_3, p_4$                                      $\alpha_{100}; \alpha_{150}; \ldots; \alpha_{1150}$                      100
Forward Split III    43          $p_1, p_2, p_3, p_4$                                      $\alpha_{100}; \alpha_{125}; \alpha_{150}; \alpha_{175}; \ldots; \alpha_{1150}$   100
Error $F_1$          1           $p_1, p_2, p_3, p_4$                                      $F_1$                                                                     100
Error $F_2$          1           $p_1, p_2, p_3, p_4$                                      $F_2$                                                                     100
Inverse Expert       4           $\alpha_{300}, \alpha_{400}, \ldots, \alpha_{1100}$       $p_1; p_2; p_3; p_4$                                                      100
Inverse Expert II    4           $\alpha_{200}, \alpha_{300}, \ldots, \alpha_{1100}$       $p_1; p_2; p_3; p_4$                                                      100
Inverse PCA          4           $\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_9$  $p_1; p_2; p_3; p_4$                                                      100

Table 2: Parameters of approximation strategies


The second way of the model output approximation is based on training an independent
ANN for every time step $t_k$. Here, each ANN approximates a simpler relation and spans
only the parameter space. A training data set for the ANN approximating the response component
$\alpha_k$ is thus given as
$\mathcal{D}^{ForwSpli,\alpha_k}_{train} = \{(p_i, \alpha_{i,k}) \mid i \in \{1, 2, \ldots, 100\}\}$ having only
$|\mathcal{D}^{ForwSpli,\alpha_k}_{train}| = 100$ samples. A disadvantage of such an approach consists in
training a large number $N_{ANN}$ of smaller ANNs. As training of $N_{ANN} = N_{time} = 1161$ different
ANNs can be almost infeasible, we select only a few of the time steps where the approximation is
constructed, and thus the model output approximation is rougher. The choice of the important time
steps and their number can be driven by expert knowledge or by the results of the sensitivity
analysis. Hence, we present three different choices so as to illustrate its influence, see
Table [2]. We further call these strategies Forward Split (ForwSpli), Forward Split II (ForwSpliII)
and Forward Split III (ForwSpliIII).

The error function approximation is the only strategy where the high dimensionality of the
model output does not impose any complications. The model output is used for evaluation of
the error function and the ANN is trained to approximate the mapping from the parameter space
to a single scalar value of the error function, i.e.
$\mathcal{D}^{Error,F_a}_{train} = \{(p_i, F_a) \mid i \in \{1, 2, \ldots, N_{train}\}\}$
and $|\mathcal{D}^{Error,F_a}_{train}| = 100$, where $F_a$ stands for a chosen error function. As we
already mentioned in Section [3], there are two very common error functions given by Eqs. (3) and (4)
and thus we investigate both, considering the two strategies further called Error $F_1$ and
Error $F_2$, respectively.

In the case of the inverse relation approximation, the high dimensionality of the model output
again needs some special treatment so as to keep the number of ANN inputs and thus the ANN
complexity reasonable. An intuitive approach is a simple selection of a limited number of output
values $\bar{\alpha} = A(\alpha)$. Here, one ANN is trained to predict one model parameter $p_j$ and thus
$\mathcal{D}^{InvExp,p_j}_{train} = \{(\bar{\alpha}_i, p_{i,j}) \mid i \in \{1, 2, \ldots, N_{train}\}\}$ and
$|\mathcal{D}^{InvExp,p_j}_{train}| = 100$. A particular choice of components in the vector
$\bar{\alpha}$, defined by the operator $A$, should take into account not only the results
of the sensitivity analysis, but also a possible measurement error in experimental data as well as
any other expert knowledge. Hence we again present two different choices in order to illustrate its
influence, see Table [2], and we further call these configurations Inverse Expert (InvExp) and
Inverse Expert II (InvExpII).

In order to reduce the influence of the expert choice, the principal components $\bar{\alpha}$ computed
as described in the previous section can be used as the ANN’s inputs and one has to choose only
their number. To compare the information contained in the same number of inputs selected by an
expert, we have chosen the same number of principal components as the number of inputs in the
Inverse Expert configuration and thus
$\mathcal{D}^{InvPCA,p_j}_{train} = \{((\bar{\alpha}_{i,1}, \ldots, \bar{\alpha}_{i,9}), p_{i,j}) \mid i \in \{1, 2, \ldots, N_{train}\}\}$ and
$|\mathcal{D}^{InvPCA,p_j}_{train}| = 100$. The principal components based strategy is further called
Inverse PCA (InvPCA). In our preliminary study, we also tested the possibility of choosing a
smaller number of PCA-based inputs selected separately for each parameter to be identified according
to the sensitivity analysis. Nevertheless, such a sensitivity-driven reduction of PCA-based
inputs was shown to deteriorate the quality of the trained ANNs.

Then, the last preparatory step concerns the generation of testing data for a final assessment
of the resulting ANNs, consisting of $N_{test} = 50$ simulations for randomly generated sets of input
parameters. The obtained data are then processed by particular approximation strategies in the
same way as the training data described above.


7. Neural network training algorithm and topology choice 


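The quality of the ANN-based approximation estimated on a given data set $\mathcal{D}$ can be
expressed as the mean relative prediction error $\varepsilon^{\mathrm{MRP}}(\mathcal{D})$ given as

$$\varepsilon^{\mathrm{MRP}}(\mathcal{D}) = \frac{\sum_{i=1}^{|\mathcal{D}|} |O_i - T_{i,\mathcal{D}}|}{|\mathcal{D}| \, (T_{\max,\mathcal{D}_{\mathrm{train}}} - T_{\min,\mathcal{D}_{\mathrm{train}}})}, \qquad (10)$$

where $O_i$ is the ANN's output corresponding to the target value $T_{i,\mathcal{D}}$ contained in the
data set $\mathcal{D}$, which consists of $|\mathcal{D}|$ samples. $T_{\max,\mathcal{D}_{\mathrm{train}}}$
and $T_{\min,\mathcal{D}_{\mathrm{train}}}$ are the maximal and minimal target values in the training
data set $\mathcal{D}_{\mathrm{train}}$, so the error $\varepsilon^{\mathrm{MRP}}(\mathcal{D})$ is always
scaled by the same factor for any chosen data set $\mathcal{D}$, and this factor corresponds to the
range of the training data.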
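As an illustration of Eq. (10), the following minimal Python sketch computes the mean relative prediction error for a small made-up data set. The function name, argument names and the numbers are hypothetical and serve only to show that the scaling always uses the range of the training targets, not the range of the evaluated set.

```python
from typing import Sequence


def mean_relative_prediction_error(
    outputs: Sequence[float],
    targets: Sequence[float],
    train_targets: Sequence[float],
) -> float:
    """Eq. (10): mean absolute error scaled by the range of the training targets."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and equally long")
    target_range = max(train_targets) - min(train_targets)
    if target_range <= 0.0:
        raise ValueError("training targets must span a non-zero range")
    abs_error_sum = sum(abs(o - t) for o, t in zip(outputs, targets))
    return abs_error_sum / (len(outputs) * target_range)


# Made-up example: the training targets span the range 2.0.
train_targets = [0.0, 0.5, 1.0, 2.0]
test_outputs = [0.9, 1.6]
test_targets = [1.0, 1.5]
error = mean_relative_prediction_error(test_outputs, test_targets, train_targets)
print(f"{100 * error:.2f} %")
```

Because the denominator depends only on the training data, errors computed on the training and testing sets remain directly comparable.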
The conjugate gradient-based method [39] was applied as a training algorithm for synaptic
weights computation and the cross-validation method was employed to determine the number of
hidden neurons. In $V$-fold cross-validation we break the training data set $\mathcal{D}_{\mathrm{train}}$
into $V$ approximately equisized subsets
$\mathcal{D}_{\mathrm{train}} = \mathcal{D}_{\mathrm{train},1} \cup \mathcal{D}_{\mathrm{train},2} \cup \dots \cup \mathcal{D}_{\mathrm{train},V}$
and then we perform $V$ training processes, each time leaving out one of the subsets
$\mathcal{D}_{\mathrm{train},i}$ and using the rest of the training data set
$\mathcal{D}_{\mathrm{train}} \setminus \mathcal{D}_{\mathrm{train},i}$.
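The partitioning used in $V$-fold cross-validation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and the interleaved assignment of samples to folds assumes that the training samples are stored in no particular order.

```python
from typing import List, Tuple


def v_fold_partitions(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return (training indices, held-out indices) for each of the V folds."""
    if not 1 < n_folds <= n_samples:
        raise ValueError("need 1 < n_folds <= n_samples")
    indices = list(range(n_samples))
    # Fold i takes every n_folds-th sample starting at i, so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    partitions = []
    for held_out in folds:
        held_out_set = set(held_out)
        training = [j for j in indices if j not in held_out_set]
        partitions.append((training, held_out))
    return partitions


# 100 training samples split into V = 10 subsets.
partitions = v_fold_partitions(n_samples=100, n_folds=10)
print(len(partitions), len(partitions[0][0]), len(partitions[0][1]))
```

Each of the resulting pairs corresponds to one training process on the reduced set and one evaluation on the subset that was left out.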

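The criterion for stopping the training process is governed by the prediction errors ratio
$r^{\mathrm{PE}}_k$ computed at the $k$-th iteration of the training algorithm, given as

$$r^{\mathrm{PE}}_k(\mathcal{D}_{\mathrm{train}} \setminus \mathcal{D}_{\mathrm{train},i}) = \frac{\sum_{j=k-J}^{k} \varepsilon^{\mathrm{MRP}}_j(\mathcal{D}_{\mathrm{train}} \setminus \mathcal{D}_{\mathrm{train},i})}{\sum_{j=k-2J}^{k-J-1} \varepsilon^{\mathrm{MRP}}_j(\mathcal{D}_{\mathrm{train}} \setminus \mathcal{D}_{\mathrm{train},i})}, \qquad (11)$$

where $\varepsilon^{\mathrm{MRP}}_j(\mathcal{D}_{\mathrm{train}} \setminus \mathcal{D}_{\mathrm{train},i})$
is the mean relative prediction error obtained at the $j$-th iteration of the training algorithm on
the training data set without its $i$-th partition. $J$ is the chosen number of iterations considered
for computing the ratio $r^{\mathrm{PE}}_k$; summing over $J$ iterations has a smoothing effect on the
ratio. The training process is stopped either when the number of iterations reaches its chosen maximal
value $K$ or when the prediction errors ratio $r^{\mathrm{PE}}_k$ exceeds a chosen critical value
$r^{\mathrm{PE}}_{\max}$.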
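The stopping rule can be illustrated by the following Python sketch. It assumes that the numerator of the ratio sums the errors of iterations $k-J$ to $k$ and the denominator those of iterations $k-2J$ to $k-J-1$; the function names and the artificial error history are hypothetical, while the default values of $J$, $K$ and the critical ratio are those listed in Table 3.

```python
from typing import Sequence


def prediction_errors_ratio(errors: Sequence[float], k: int, window: int) -> float:
    """Ratio of summed errors over iterations k-J..k to those over k-2J..k-J-1.

    errors[j] is the training error after iteration j (0-based); window is J.
    """
    if k - 2 * window < 0 or k >= len(errors):
        raise ValueError("need 2J earlier iterations and k inside the history")
    recent = sum(errors[k - window : k + 1])
    older = sum(errors[k - 2 * window : k - window])
    return recent / older


def should_stop(
    errors: Sequence[float],
    k: int,
    window: int = 100,
    max_iterations: int = 5000,
    critical_ratio: float = 0.999,
) -> bool:
    """Stop at the iteration limit or once the error has practically stagnated."""
    if k + 1 >= max_iterations:
        return True
    if k < 2 * window:
        return False  # not enough history to evaluate the ratio yet
    return prediction_errors_ratio(errors, k, window) > critical_ratio


# Artificial error histories: one still decreasing, one completely stagnated.
history = [1.0 / (1 + j) for j in range(300)]
stagnated = [1.0] * 300
print(should_stop(history, 250), should_stop(stagnated, 250))
```

A ratio close to one means that the errors of the most recent iterations are no smaller than those of the preceding ones, so further training is unlikely to pay off.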

Once the training process is completed, the ANN is evaluated on the remaining part of the
training data $\mathcal{D}_{\mathrm{train},i}$, which was not used in the training process. The quality
of the ANN with a particular number of hidden neurons $h$ is assessed by the cross-validation error
$\varepsilon^{\mathrm{CVE}}_h$, which is computed as the mean of the errors obtained for the ANNs trained
on the subsets $\mathcal{D}_{\mathrm{train}} \setminus \mathcal{D}_{\mathrm{train},i}$ and then evaluated
on the remaining subset $\mathcal{D}_{\mathrm{train},i}$, i.e.

$$\varepsilon^{\mathrm{CVE}}_h = \frac{1}{V} \sum_{i=1}^{V} \varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{train},i}). \qquad (12)$$


We start with an ANN having $h_{\min}$ hidden neurons and we compute the corresponding
cross-validation error. Then, one hidden neuron is added and, after all the training processes on
the training data subsets, the new cross-validation error is evaluated. We compute the
cross-validation error ratio $r^{\mathrm{CVE}}_h$ as

$$r^{\mathrm{CVE}}_h = \frac{\varepsilon^{\mathrm{CVE}}_h}{\varepsilon^{\mathrm{CVE}}_{h-1}}. \qquad (13)$$

We count the situations when the ratio $r^{\mathrm{CVE}}_h$ exceeds a chosen critical value
$r^{\mathrm{CVE}}_{\max}$. If this happens $W$ times, the addition of hidden neurons is stopped. Then we
choose the architecture having the smallest cross-validation error $\varepsilon^{\mathrm{CVE}}_h$ and
the particular ANN with the synaptic weights having the smallest training error
$\varepsilon^{\mathrm{MRP}}$.
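The constructive search for the number of hidden neurons can be sketched as follows. This is a simplified illustration only: `cv_error` is a hypothetical callable standing in for the $V$ training runs that yield the cross-validation error for a given $h$, and the safety cap `h_limit` is an added assumption that is not part of the described procedure. The default values of $h_{\min}$, the critical ratio and $W$ follow Table 3.

```python
from typing import Callable, Dict, Tuple


def select_hidden_neurons(
    cv_error: Callable[[int], float],
    h_min: int = 1,
    critical_ratio: float = 0.99,
    max_exceedances: int = 3,
    h_limit: int = 50,
) -> Tuple[int, Dict[int, float]]:
    """Grow the hidden layer one neuron at a time and return the best size found."""
    errors = {h_min: cv_error(h_min)}
    exceedances = 0
    h = h_min
    while exceedances < max_exceedances and h < h_limit:
        h += 1
        errors[h] = cv_error(h)
        # Ratio of the new cross-validation error to the previous one.
        if errors[h] / errors[h - 1] > critical_ratio:
            exceedances += 1
    best_h = min(errors, key=errors.get)
    return best_h, errors


def toy_cv_error(h: int) -> float:
    """Made-up error curve with a minimum at h = 4, used only for demonstration."""
    return (h - 4) ** 2 + 1.0


best_h, errors = select_hidden_neurons(toy_cv_error)
print(best_h, sorted(errors))
```

Counting $W$ exceedances instead of stopping at the first one makes the search tolerant to a single unlucky training run.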


Parameter                                                        Symbol                        Value
Number of subsets in cross-validation                            $V$                           10
Number of iterations considered in $r^{\mathrm{PE}}$             $J$                           100
Maximal number of training iterations                            $K$                           5000
Maximal value of prediction errors ratio                         $r^{\mathrm{PE}}_{\max}$      0.999
Starting value of hidden neurons                                 $h_{\min}$                    1
Maximal value of cross-validation error ratio                    $r^{\mathrm{CVE}}_{\max}$     0.99
Maximal number of times $r^{\mathrm{CVE}}_{\max}$ is exceeded    $W$                           3

Table 3: Parameters of ANN training algorithm and cross-validation method
The resulting ANNs are tested on an independent testing data set $\mathcal{D}_{\mathrm{test}}$. Since
some of the approximation strategies consist of a high number of ANNs, the resulting number of hidden
neurons and the achieved errors on training and testing data for all the trained ANNs are listed in
Appendix A. A brief summary of these results is presented in Table 4¹.


Strategy             $h$         $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{train}})$ [%]    $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{test}})$ [%]
Forward Complex      7           2.03                                                               2.67
Forward Split        3 to 10     0.06 to 1.06                                                       0.06 to 1.27
Forward Split II     4 to 13     0.06 to 1.42                                                       0.07 to 2.04
Forward Split III    3 to 13     0.03 to 1.50                                                       0.03 to 1.98
Error $F_1$          10          0.40 to 0.54                                                       0.57 to 0.74
Error $F_2$          9 to 11     0.78 to 1.36                                                       0.96 to 1.56
Inverse Expert       5 to 8      1.14 to 5.74                                                       1.31 to 6.43
Inverse Expert II    4 to 6      1.38 to 5.79                                                       1.36 to 6.52
Inverse PCA          4 to 8      0.28 to 10.50                                                      0.33 to 16.73

Table 4: Architecture of ANNs in particular strategies and their errors on training and
testing data.


Regarding the number of hidden neurons, the results point to higher complexity of the error 
function relationships. Nevertheless, the differences in hidden neurons among particular strate¬ 
gies are relatively small. 

The quality of the resulting ANNs in approximation of the given relationships is measured by
the obtained errors on all the training, $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{train}})$, and
testing, $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{test}})$, data. Small differences between the
training and testing errors indicate well-trained ANNs and the good quality of the training method as
well as of the method for topology estimation. Note that overtrained ANNs usually lead to
significantly higher errors on testing data.

Comparing the approximation quality of the particular strategies, we can point out good 
results of the forward model approximation and error function approximation, where the errors 
did not exceed the value of 3 %. The good approximation of the forward model is not surprising 
since the relationship is well-defined, smooth and relatively simple. The good results of the error 
function approximation are more unexpected, because the relationship here is probably more 
nonlinear and complex. One possible explanation is a large spread of error function values on
the training data, which is used to scale the errors (see Eq. (10)). While the error functions
converge to zero near the optimal parameter values, they quickly rise to extremely high values 
for parameter values more distant from the optimum. Hence, we presume that the small errors 
obtained in the error function approximation do not promise comparably good results in the final 
parameter identification. 

The results of the inverse relation approximation are not very good, which was to be expected
because the relationship is unknown and probably ill-posed. Nevertheless, the obtained errors are actually
the final errors of the whole identification process for the training and testing data, since there is 
no other following step concerning any optimisation as in the case of other identification strate¬ 
gies. Hence, further comments on these results are presented in the following section concerning 
verification of the overall identification strategies on the testing data. 


¹ The error function approximation strategies are intrinsically related to a particular experimental
curve. The results here are obtained for the experimental “Mokra” data described in Section 9 in more
detail.

8. Verification of model calibration 


Since the errors in Table 4 represent only the quality of the constructed ANNs, we also have to
investigate the quality of the identification procedures. This section is devoted to verification
of the model calibration, where the goal is to predict the model parameters’ values corresponding
to the simulated data, which are not perturbed by any noise. The advantage of verification is
that we also know the true values of the parameters and thus can easily evaluate the quality of
their estimation by each strategy. In particular, the calibration strategies were applied to estimate
the parameters’ values for all the training and testing simulations.

As mentioned, in the case of the inverse relation approximation, the outputs of the ANNs are directly
the predicted values of the identified parameters $\hat{\mathbf{p}}$. In the case of the forward model
approximation, we have to run a subsequent optimisation process. Here, the evolutionary algorithm
GRADE² (see the cited reference for details about this method) is applied to find a set of parameter
values $\hat{\mathbf{p}}$ minimising the square distance $\delta$ between the components of the model
response $\alpha_k$ and their corresponding ANN-based approximated counterparts $\tilde{\alpha}_k$, i.e.

$$\delta = \sum_k (\alpha_k - \tilde{\alpha}_k)^2, \qquad (14)$$

where $k$ corresponds to the selected approximated components defined for particular identification
strategies in Table 2. In such a way, the parameters $\hat{\mathbf{p}}$ are predicted for all the
training as well as testing data. As the true values of the parameters $\mathbf{p}$ are known in the
verification process, the mean prediction errors $\bar{\varepsilon}$ are computed relative to the
spread of the training data, i.e.

$$\bar{\varepsilon}(\hat{p}_j) = \frac{\sum_{i=1}^{|\mathcal{D}|} |p_{i,j} - \hat{p}_{i,j}|}{|\mathcal{D}| \, (p_{\max(\mathcal{D}_{\mathrm{train}}),j} - p_{\min(\mathcal{D}_{\mathrm{train}}),j})}, \qquad (15)$$

and the obtained errors for particular identification strategies are listed in Table 5.
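The identification step of the forward strategies can be illustrated by the following Python sketch. It is not the GRADE algorithm: a plain uniform random search is swapped in purely for illustration, and the linear `toy_surrogate` is a hypothetical stand-in for the trained ANNs. Only the objective of Eq. (14), the error measure of Eq. (15) and the budget of 10000 evaluations are taken from the text.

```python
import random
from typing import Callable, List, Sequence, Tuple


def squared_distance(response: Sequence[float], approximation: Sequence[float]) -> float:
    """Eq. (14): sum of squared differences over the selected response components."""
    if len(response) != len(approximation):
        raise ValueError("both sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(response, approximation))


def identify_parameters(
    surrogate: Callable[[Sequence[float]], List[float]],
    observed: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    n_evaluations: int = 10000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Random-search stand-in for the optimiser: keep the candidate with the smallest delta."""
    rng = random.Random(seed)
    best_p: List[float] = []
    best_delta = float("inf")
    for _ in range(n_evaluations):
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        delta = squared_distance(observed, surrogate(candidate))
        if delta < best_delta:
            best_p, best_delta = candidate, delta
    return best_p, best_delta


def mean_relative_parameter_error(
    true_values: Sequence[float],
    predicted_values: Sequence[float],
    train_values: Sequence[float],
) -> float:
    """Eq. (15): mean absolute error of one parameter scaled by its training range."""
    spread = max(train_values) - min(train_values)
    total = sum(abs(p - q) for p, q in zip(true_values, predicted_values))
    return total / (len(true_values) * spread)


def toy_surrogate(p: Sequence[float]) -> List[float]:
    """Hypothetical two-parameter response evaluated at three 'time steps'."""
    return [p[0] + p[1] * t for t in (0.0, 1.0, 2.0)]


true_p = [0.3, 0.6]
observed = toy_surrogate(true_p)
best_p, best_delta = identify_parameters(toy_surrogate, observed, [(0.0, 1.0), (0.0, 1.0)])
print([round(v, 2) for v in best_p], best_delta < 0.01)
```

In a real application the surrogate would return the ANN predictions of the selected response components, and any global optimiser could take the place of the random search.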



Strategy             $p_1$ train  $p_1$ test  $p_2$ train  $p_2$ test  $p_3$ train  $p_3$ test  $p_4$ train  $p_4$ test  $\alpha$ train  $\alpha$ test
Forward Complex      16.78        17.09       52.20        47.91       6.06         5.45        3.67         2.69        1.079           1.088
Forward Split        9.48         11.62       30.18        38.45       3.14         4.65        1.17         3.10        0.310           0.370
Forward Split II     5.09         6.47        13.34        15.03       1.69         2.60        0.67         1.02        0.144           0.205
Forward Split III    4.12         4.84        10.73        10.65       1.49         1.63        0.57         0.64        0.124           0.160
Inverse Expert       5.74         6.43        5.15         6.21        1.99         2.16        1.14         1.31        0.490           0.493
Inverse Expert II    5.79         6.23        5.60         6.52        2.60         3.18        1.38         1.36        0.444           0.533
Inverse PCA          3.86         5.10        10.50        16.73       1.25         1.89        0.28         0.33        0.377           1.209

Table 5: Results of verification of particular identification strategies in terms of mean relative
prediction errors $\bar{\varepsilon}$ [%] in the parameters $p_1$ to $p_4$ and in the response $\alpha$,
evaluated on training and testing data. Best results are highlighted in bold font.


In application of the identification strategy to real experimental data, the parameter values 
are not known, but the success of the identification process is quantified by quality of fitting the 
data by the model response obtained for the identified parameters. Hence, the model simulations 


² The parameters of the GRADE algorithm were set to pool_rate = 4, radioactivity = 0.33 and
cross_limit = 0.1. The algorithm was stopped after 10000 cost function evaluations.
were performed for all the identified parameter sets and the prediction errors $\bar{\varepsilon}$ in
terms of the predicted responses $\alpha$ are computed analogously to Eq. (15). Their values, averaged
also over all the response components, are then listed in Table 5.

The results for the strategies based on an approximation of the error function are missing
here, because they require building a particular ANN for every curve of the hydration degree and
running an additional minimisation procedure for each of them. This is computationally prohibitive,
and thus these strategies are only validated on the experimental data as described in the following
section.

One can see that among the forward strategies, the complex variant provided the worst results
in the training process as well as in the final identification. The complex relationship covering
the time domain apparently causes certain difficulties in the training process. We can conclude
that training a set of neural networks means more work, but offers significantly better quality
of the model approximation. We can also point out the large differences in the errors of particular
parameters, which correspond to the influence of particular parameters on the model response. As
demonstrated in Figure 3, the largest spread of the model response is related mainly to changes in
the parameters $p_4$ and $p_3$, while the influence of the parameter $p_1$ and also $p_2$ seems to be
almost negligible. The sensitivity analysis illustrated in Figure 4 shows very high sensitivity of
the model response to the parameter $p_2$ at the early stage of hydration; nevertheless, at this stage
the spread of the model response is almost negligible and even a very small error in the response
approximation can be fatal for identification of the parameter $p_2$. On the other hand, it is not
surprising that the identification accuracy improves significantly with an increasing number of
approximated response components, i.e. an increasing number of trained ANNs.

Despite the worse results in the training of ANNs, the inverse strategies achieved results
comparable with the forward strategies in parameter identification and also in fitted measurements.
More precisely, the results of measurement fitting are slightly worse, but the errors in
parameter prediction are smaller. Especially the Inverse Expert strategies provided surprisingly
small errors in $p_2$ prediction, and the errors in parameters are generally more balanced. This
phenomenon can possibly be explained by the fact that each ANN is trained to predict each parameter
separately, thus automatically selecting and emphasizing the combinations of the model response
critical for the parameter. In the strategy Inverse Expert II, the usage of one additional input at
the early stage of hydration caused no improvement of the resulting prediction, which is probably
caused again by the fact that the responses at this stage have a negligible spread and almost
no predictive value. The last interesting result concerns the application of principal component
analysis. The Inverse PCA strategy again provided significantly different errors in prediction of
particular parameters, similarly to the forward strategies. The reason possibly resides in the fact
that PCA emphasizes the most important components, while it can mix the effects of the less
significant parameters. Nevertheless, when compared with the strategies Forward Split and Inverse
Expert using the same number of response components, the Inverse PCA provided the best results in
prediction of all the parameters except $p_2$. Its quality of measurement fitting is, however, the
worst among those strategies.

From this thorough comparison we may conclude that all the inverse strategies provide very
good results, which makes them highly promising considering their very simple implementation,
which does not include any additional optimisation process apart from the training of the ANNs.
Moreover, the Inverse Expert strategies can be especially recommended for identification of less
significant parameters.
9. Validation of model calibration


The previous section was focused on mutual comparison of the presented identification strategies
on simulated data. However, a complete comparison has to include their validation on
experimental data. To that purpose we used four experimental data sets obtained by isothermal
calorimetry: one for cement “Mokra” CEM I 42.5 R taken directly from Heidelberg cement
group’s kiln in Mokra, Czech Republic [35], and three others from the literature:
“Boumiz”, “Hua” [41] and “Princigallo”.

In parameter identification from experimental data, one often faces difficulties related to (i)
experimental errors and (ii) model imperfections. Especially in the case of models with parameters
having a specific physical meaning, like the affinity hydration model, it happens that the
experimental data seem to lie beyond the physically meaningful values of the model parameters.
This is exactly what we face in the case of the four experimental curves depicted in Figure 6. The




[Figure 6, four panels labelled with the applied starting-time corrections: 0.5 h, 1 h, 4.5 h and 3.5 h]

Figure 6: Corrections of experimental curves.


grey curves represent the training samples generated in an optimised fashion so as to maximally
cover the parameter space. Nevertheless, it is visible that all the experimental curves lie outside
the bundle of the training samples. Applying the identification strategies to these data would
require the ANNs to extrapolate and would probably lead to unphysical and wrong predictions
of the model parameters. Such results were previously presented for “Mokra”. Looking in more
detail at the experimental curves, one can see that the difference between the experimental data
and the simulations can be explained by a wrong estimation of the origin of hydration. Correction of
the starting time moves the curves into the bundle of response simulations. As a matter of fact,
a correction on the order of hours is negligible compared to the duration of the whole hydration
process, which often lasts days or weeks. Moreover, the goal of this paper is not to argue against the
correctness of the model or data, but to demonstrate the properties of particular identification
strategies, which can be better illustrated in a situation where the observed data are not outliers
w.r.t. the sampled parameter domain. Readers interested in the identification of outliers are
referred to [11].

In general, validation does not allow for a comparison in terms of parameters’ values, because
these are not known a priori. Nevertheless, the simplicity and the fast simulation of the
affinity hydration model permit a direct optimisation of the model parameters so as to fit the
measured data without any incorporated approximation. The resulting optimal solutions can
then be compared with the results obtained using the ANN approximations. To that purpose, we
again employ the error functions given in Eqs. (3) and (4) and the GRADE algorithm with the
same setting as in the previous section to minimise both error functions. The obtained results
are referred to as Direct1 and Direct2, respectively, and they represent the best results that can
be achieved with the current model on the given data.

Subsequently, the identification strategies were applied to the experimental data using the 
prepared ANNs. Since the ANNs are constructed for specific time steps of the hydration degree, 
the experimental curves are interpolated to the time steps required by the particular ANNs. If 
necessary, the data are extrapolated beyond the last measured time step assuming the further 
progress of hydration to be constant at the last measured value. The identified parameters together
with the parameters’ values obtained by the direct optimisation are given in Tables 6
and 7. Note that the parameter values highlighted in bold font refer to situations where the
measured data lie beyond the domain of the training data and the ANN is forced to extrapolate. The


                     “Mokra”                                                      “Boumiz”
Method               $p_1$   $p_2$   $p_3$   $p_4$   $\bar{\varepsilon}(\alpha)$   $p_1$   $p_2$   $p_3$   $p_4$   $\bar{\varepsilon}(\alpha)$
Direct1              0.84    0.99    0.18    0.05    0.70                          0.93    1.00    0.02    0.36    2.37
Direct2              0.82    0.98    0.18    0.05    0.65                          0.93    1.00    0.02    0.35    2.70
Forward Complex      0.81    1.00    0.18    0.03    1.35                          1.00    0.61    0.08    0.36    12.67
Forward Split        0.82    1.00    0.19    0.05    1.15                          0.96    1.00    0.08    0.35    5.44
Forward Split II     0.78    1.01    0.18    0.05    0.83                          1.00    1.00    0.08    0.35    4.11
Forward Split III    0.80    1.00    0.19    0.05    0.91                          0.98    1.00    0.05    0.35    3.03
Error $F_1$          0.78    0.73    0.09    0.07    3.89                          -       -       -       -       -
Error $F_2$          1.00    1.19    0.15    -0.06   2.73                          -       -       -       -       -
Inverse Expert       1.16    -0.18   0.29    0.03    6.83                          0.78    -0.24   0.22    0.30    35.11
Inverse Expert II    1.21    -0.06   0.19    0.16    4.68                          1.27    -0.14   0.20    0.13    25.94
Inverse PCA          0.75    0.83    0.18    0.06    1.82                          0.78    0.87    0.02    0.35    10.82

Table 6: Results of identification strategies obtained for “Mokra” and “Boumiz”: identified
values of model parameters and mean relative error in degree of hydration $\bar{\varepsilon}(\alpha)$ [%].
                     “Hua”                                                        “Princigallo”
Method               $p_1$   $p_2$   $p_3$   $p_4$   $\bar{\varepsilon}(\alpha)$   $p_1$   $p_2$   $p_3$   $p_4$   $\bar{\varepsilon}(\alpha)$
Direct1              1.00    0.94    0.20    0.11    2.24                          1.00    0.85    0.19    0.14    3.46
Direct2              0.99    0.96    0.21    0.11    2.46                          1.00    0.88    0.21    0.15    3.27
Forward Complex      1.00    0.64    0.22    0.08    4.10                          1.00    0.58    0.23    0.14    6.21
Forward Split        0.87    1.00    0.19    0.11    2.84                          0.78    0.98    0.18    0.15    4.39
Forward Split II     0.93    0.96    0.21    0.11    2.92                          0.92    0.82    0.20    0.14    4.44
Forward Split III    0.87    1.01    0.18    0.10    2.71                          0.89    0.92    0.18    0.14    3.75
Inverse Expert       0.94    -0.29   0.26    0.12    10.64                         1.07    -0.16   0.22    0.15    9.02
Inverse Expert II    1.26    -0.27   0.19    0.02    6.23                          1.52    -1.38   0.13    -0.24   15.05
Inverse PCA          1.00    0.89    0.15    0.12    2.41                          1.13    0.74    0.19    0.15    3.62

Table 7: Results of identification strategies obtained for “Hua” and “Princigallo”: identified
values of model parameters and mean relative error in degree of hydration $\bar{\varepsilon}(\alpha)$ [%].


identified parameters were used as inputs for simulations, whose results are compared with the
experimental data in Figures 7 and 8. To quantify the quality of the obtained fits, Tables 6 and 7
also contain the mean relative error $\bar{\varepsilon}(\alpha)$ [%] computed in the same manner as
in Table 5 for an easy comparison of the verification and validation results.
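The resampling of an experimental curve to the time steps required by the ANNs, with the degree of hydration held constant beyond the last measurement, can be sketched as follows. The text does not specify the interpolation scheme, so linear interpolation is assumed here; the function name, the handling of times before the first measurement and the sample data are likewise hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample_curve(
    times: Sequence[float],
    values: Sequence[float],
    query_times: Sequence[float],
) -> List[float]:
    """Linearly interpolate a measured curve; hold the last value beyond the data.

    times must be strictly increasing.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    resampled = []
    for t in query_times:
        if t >= times[-1]:
            resampled.append(values[-1])  # constant extrapolation after the last point
        elif t <= times[0]:
            resampled.append(values[0])  # assumption: also hold the first value
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)
            resampled.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return resampled


# Made-up measurements (time in hours, degree of hydration) and ANN time steps.
measured_t = [0.0, 10.0, 20.0]
measured_alpha = [0.0, 0.4, 0.6]
resampled = resample_curve(measured_t, measured_alpha, [5.0, 15.0, 30.0])
print([round(v, 3) for v in resampled])
```

The last query time lies beyond the measured range, so the sketch returns the last measured value there, mirroring the constant continuation described above.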

The strategies based on the error function approximation are illustrated on the parameter 
identification from “Mokra” data, which are used to define the error functions, which are ap¬ 
proximated by the ANNs. The trained ANNs are then optimised by the GRADE algorithm so as 
to provide the optimal set of the identified parameters. As we presumed, the identification results
are not satisfactory despite very good results of the ANNs’ training processes, see Table 4. The
training and testing errors are small relative to the spread of the error functions’ values, which
increase quickly with the distance from the optimal solution. The strategy, however, requires high
precision of the ANN’s approximation near the optimal solution, which can hardly be achieved
due to the overall complex shape of the error functions.

The worst results on all the experimental curves were obtained by the inverse strategies based 
on selected components of the model response used as the ANNs’ inputs. The results pointed 
out the high sensitivity of this strategy to the measurement noise and to the specific choice of the 
inputs. Both drawbacks are overcome by employing principal component analysis, which makes it
possible to employ a large number of the response components and to filter the measurement noise out
of the first several principal components. The Inverse PCA strategy thus achieved significantly better
results.

The forward strategies provided generally the best results consistent with the results of veri¬ 
fication on the simulated data. These strategies thus proved to be rather immune to the noise in 
experimental data. 

10. Conclusions 

The presented paper reviews and compares several possible applications of artificial neural
networks in calibration of numerical models. In particular, the feedforward layered neural
network is employed in three basic schemes to surrogate: (i) the response of a model, (ii) the inverse
relationship of model parameters and model response and (iii) the error function quantifying how well
the model response fits the experimental data. Their advantages and drawbacks are illustrated on
calibration of four parameters of the affinity hydration model. The model is chosen for its
nonlinearity and difference in sensitivities to particular parameters on one hand and its simplicity
and very fast numerical evaluation on the other. The latter allows for the model calibration based on
the stochastic evolutionary algorithm without any involved approximation and thus better
quantification of calibration results provided by the particular strategies. The investigated
calibration strategies are verified on 50 simulated curves of hydration degree and validated on four
experimental ones.

Figure 7: Comparison of corrected experimental data “Mokra” and “Boumiz” and corresponding
results of calibration strategies.

A simplified summary of the obtained results is given in Table 8. One of the simplest strategies
from the implementation point of view is based on an approximation of the error function
(Error F), where only one neural network needs to be trained for the prediction of the error
function values. This simplicity, however, does not hold in the case of multiple experimental
measurements, where the whole identification process including the neural network training as well
as its optimisation needs to be repeated for every new experiment. Moreover, the presented
examples revealed that the complexity of the error function may cause difficulties for
neural network training, resulting in high errors in the identified parameters. The potential of the
neural network is wasted on approximating the whole domain, while accurate predictions are
required only in the vicinity of the optimal values of the parameters. Hence, this strategy is more
suited for surrogate models based on radial basis function networks or kriging, which can be
trained along with the optimisation of the error function, thus allowing the precision to be improved
in the promising area, see e.g. [26].

Figure 8: Comparison of corrected experimental data “Hua” and “Princigallo” and corresponding
results of calibration strategies.

Strategy           $N_{\mathrm{ANN}}$   optimisation   new data                  errors
Forward Complex    1                    yes            optimisation              middle
Forward Split      $N_\alpha$           yes            optimisation              low
Error F            1                    yes            training + optimisation   high
Inverse Expert     $N_p$                no             -                         high
Inverse PCA        $N_p$                no             -                         middle

Table 8: Simplified summary of calibration strategies. $N_\alpha$ stands for the number of approximated
components of the model response, $N_p$ is the number of model parameters.

An equally simple strategy is based on the approximation of the model response, where time
or space variables are included among the neural network inputs (Forward Complex). This strategy
is better suited for layered neural networks, which are trained only once and can then be used
repeatedly for any new observations. The effort invested into the approximation of the whole domain
is thus not wasted. The application to new data requires only one new optimisation process.
The results obtained by this strategy were not excellent, but can be considered a satisfactory
solution at a low price.

The best results were achieved by separate approximations of particular response components,
where a higher number of neural networks is trained to approximate the rather simple
relationship defined by the calibrated model (Forward Split). This procedure requires more work on
the preparation of the networks, which is compensated by the high accuracy of the obtained results.
The accuracy increases with the number of approximated response components and
can thus be influenced by the work invested in the surrogate construction. Moreover, the constructed
approximations can then be used again for any new data, where only the optimisation of model
parameters needs to be repeated.

The worst results were obtained by the strategy approximating the inverse mapping from
the response components to the model parameters (Inverse Expert). Such a relationship does not
have to exist and can hardly be approximated. Moreover, if the inputs for a neural network are
not properly selected and thus highly sensitive to the measurement error, the procedure provides
unsatisfactory results. Nevertheless, using expert knowledge for a proper selection of inputs
as presented in [28], this strategy gives good results at a very low price, since neither training
nor an optimisation process, but only a simple evaluation of the trained networks, is needed for
parameter identification from new data.

The necessity of the expert knowledge and the sensitivity to the measurement error can be easily
circumvented by employing principal component analysis on the model response components
(Inverse PCA). Then only the number of components entering as inputs in the neural network
needs to be selected. The strategy thus represents a compromise solution providing satisfactory
results at a low price, especially in the repeated application to new observed data.
Appendix A. Configurations and results of particular neural networks

The particular choices of ANN inputs and outputs are presented in Tables A.9 and A.11 for
forward and inverse mode strategies, respectively.
Strategy           Inputs                           $h$   Output                       $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{train}})$ [%]   $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{test}})$ [%]
Forward Complex    $p_1, p_2, p_3, p_4, t_k$        7     $\alpha_k$                   2.03   2.67
Forward Split      $p_1, p_2, p_3, p_4$             7     $\alpha_{300}$               0.06   0.06
                   $p_1, p_2, p_3, p_4$             8     $\alpha_{400}$               0.07   0.12
                   $p_1, p_2, p_3, p_4$             7     $\alpha_{500}$               0.08   0.11
                   $p_1, p_2, p_3, p_4$             2     $\alpha_{600}$               0.62   1.01
                   $p_1, p_2, p_3, p_4$             7     $\alpha_{700}$               0.79   1.01
                   $p_1, p_2, p_3, p_4$             6     $\alpha_{800}$               1.06   1.27
                   $p_1, p_2, p_3, p_4$             8     $\alpha_{900}$               0.28   0.32
                   $p_1, p_2, p_3, p_4$             10    $\alpha_{1000}$              0.22   0.27
                   $p_1, p_2, p_3, p_4$             9     $\alpha_{1100}$              0.21   0.30
Forward Split II   $p_1, p_2, p_3, p_4$             7     $\alpha_{100}$               0.52   0.88
                   $p_1, p_2, p_3, p_4$             4     $\alpha_{150}$               0.86   1.39
                   $p_1, p_2, p_3, p_4$             4     $\alpha_{200}$               0.08   0.11
                   $p_1, p_2, p_3, p_4$             8     $\alpha_{250}$               0.68   0.80
                   $p_1, p_2, p_3, p_4$             4     $\alpha_{300}$               0.44   0.60
                   $p_1, p_2, p_3, p_4$             5     $\alpha_{350}$               0.48   0.95
                   $p_1, p_2, p_3, p_4$             4     $\alpha_{400}$               0.06   0.07
                   $p_1, p_2, p_3, p_4$             6     $\alpha_{450}$               0.07   0.10
                   $p_1, p_2, p_3, p_4$             6     $\alpha_{500}$               0.07   0.13
                   $p_1, p_2, p_3, p_4$             9     $\alpha_{550}$               0.15   0.22
                   $p_1, p_2, p_3, p_4$             6     $\alpha_{600}$               1.42   2.04
                   $p_1, p_2, p_3, p_4$             5     $\alpha_{650}$               0.84   1.19
                   $p_1, p_2, p_3, p_4$             6     $\alpha_{700}$               0.55   0.73
                   $p_1, p_2, p_3, p_4$             8     $\alpha_{750}$               0.60   1.18
                   $p_1, p_2, p_3, p_4$             7     $\alpha_{800}$               0.46   0.62
                   $p_1, p_2, p_3, p_4$             9     $\alpha_{850}$               0.75   1.09
                   $p_1, p_2, p_3, p_4$             7     $\alpha_{900}$               0.20   0.23
                   $p_1, p_2, p_3, p_4$             13    $\alpha_{950}$               0.28   0.43
                   $p_1, p_2, p_3, p_4$             9     $\alpha_{1000}$              0.73   1.16
                   $p_1, p_2, p_3, p_4$             6     $\alpha_{1050}$              0.10   0.17
                   $p_1, p_2, p_3, p_4$             9     $\alpha_{1100}$              0.10   0.19
                   $p_1, p_2, p_3, p_4$             7     $\alpha_{1150}$              0.08   0.14
Error $F_1$        $p_1, p_2, p_3, p_4$             10    $F_1$ for Mokra              0.54   0.74
                   $p_1, p_2, p_3, p_4$             10    $F_1$ for shifted Mokra      0.40   0.57
Error $F_2$        $p_1, p_2, p_3, p_4$             9     $F_2$ for Mokra              0.78   0.96
                   $p_1, p_2, p_3, p_4$             9     $F_2$ for shifted Mokra      1.36   1.56

Table A.9: Architecture of particular ANNs constructed in forward strategies and their errors on
training and testing data.
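Strategy            Inputs                  $h$   Output             $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{train}})$ [%]   $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{test}})$ [%]
Forward Split III   $p_1, p_2, p_3, p_4$    7     $\alpha_{100}$     0.07   0.08
                    $p_1, p_2, p_3, p_4$    4     $\alpha_{130}$     0.56   0.45
                    $p_1, p_2, p_3, p_4$    4     $\alpha_{150}$     1.03   0.95
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{170}$     0.08   0.06
                    $p_1, p_2, p_3, p_4$    4     $\alpha_{200}$     0.76   0.74
                    $p_1, p_2, p_3, p_4$    5     $\alpha_{230}$     0.44   0.41
                    $p_1, p_2, p_3, p_4$    4     $\alpha_{250}$     0.49   0.45
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{270}$     0.07   0.06
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{300}$     0.07   0.05
                    $p_1, p_2, p_3, p_4$    9     $\alpha_{330}$     0.07   0.11
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{350}$     0.16   0.37
                    $p_1, p_2, p_3, p_4$    5     $\alpha_{370}$     1.47   1.90
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{400}$     0.84   1.06
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{430}$     0.59   0.96
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{450}$     0.71   0.92
                    $p_1, p_2, p_3, p_4$    9     $\alpha_{470}$     0.54   0.55
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{500}$     0.89   0.98
                    $p_1, p_2, p_3, p_4$    13    $\alpha_{530}$     0.23   0.40
                    $p_1, p_2, p_3, p_4$    9     $\alpha_{550}$     0.30   0.44
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{570}$     0.73   0.63
                    $p_1, p_2, p_3, p_4$    9     $\alpha_{600}$     0.12   0.20
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{630}$     0.11   0.18
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{650}$     0.07   0.08
                    $p_1, p_2, p_3, p_4$    4     $\alpha_{670}$     0.55   0.49
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{700}$     0.07   0.09
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{730}$     0.06   0.06
                    $p_1, p_2, p_3, p_4$    9     $\alpha_{750}$     0.07   0.06
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{770}$     0.03   0.03
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{800}$     0.05   0.04
                    $p_1, p_2, p_3, p_4$    5     $\alpha_{830}$     0.09   0.10
                    $p_1, p_2, p_3, p_4$    5     $\alpha_{850}$     0.77   0.42
                    $p_1, p_2, p_3, p_4$    3     $\alpha_{870}$     0.23   0.27
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{900}$     1.06   0.99
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{930}$     1.50   1.88
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{950}$     0.37   0.49
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{970}$     1.38   1.98
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{1000}$    0.93   1.05
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{1030}$    0.26   0.35
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{1050}$    0.83   0.87
                    $p_1, p_2, p_3, p_4$    6     $\alpha_{1070}$    1.12   1.04
                    $p_1, p_2, p_3, p_4$    8     $\alpha_{1100}$    0.31   0.36
                    $p_1, p_2, p_3, p_4$    11    $\alpha_{1130}$    0.13   0.20
                    $p_1, p_2, p_3, p_4$    7     $\alpha_{1150}$    0.14   0.20

Table A.10: Architecture of particular ANNs constructed in forward strategies and their errors
on training and testing data.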
Strategy            Inputs                                                               $h$   Output   $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{train}})$ [%]   $\varepsilon^{\mathrm{MRP}}(\mathcal{D}_{\mathrm{test}})$ [%]
Inverse Expert      9 values: $\alpha_{300}, \alpha_{400}, \dots, \alpha_{1100}$         5     $p_1$    5.74    6.43
                    9 values: $\alpha_{300}, \alpha_{400}, \dots, \alpha_{1100}$         7     $p_2$    5.15    6.21
                    9 values: $\alpha_{300}, \alpha_{400}, \dots, \alpha_{1100}$         8     $p_3$    1.99    2.16
                    9 values: $\alpha_{300}, \alpha_{400}, \dots, \alpha_{1100}$         5     $p_4$    1.14    1.31
Inverse Expert II   10 values: $\alpha_{200}, \alpha_{300}, \dots, \alpha_{1100}$        5     $p_1$    5.79    6.23
                    10 values: $\alpha_{200}, \alpha_{300}, \dots, \alpha_{1100}$        4     $p_2$    5.60    6.52
                    10 values: $\alpha_{200}, \alpha_{300}, \dots, \alpha_{1100}$        6     $p_3$    2.60    3.18
                    10 values: $\alpha_{200}, \alpha_{300}, \dots, \alpha_{1100}$        5     $p_4$    1.38    1.36
Inverse PCA         9 values: $\bar{a}_1, \bar{a}_2, \dots, \bar{a}_9$                   6     $p_1$    3.86    5.10
                    9 values: $\bar{a}_1, \bar{a}_2, \dots, \bar{a}_9$                   4     $p_2$    10.50   16.73
                    9 values: $\bar{a}_1, \bar{a}_2, \dots, \bar{a}_9$                   8     $p_3$    1.25    1.89
                    9 values: $\bar{a}_1, \bar{a}_2, \dots, \bar{a}_9$                   8     $p_4$    0.28    0.33

Table A.11: Architecture of particular ANNs in inverse strategies and their errors on training and
testing data.